Abstract: Large Language Models (LLMs) are rapidly reshaping scientific research. We analyze these changes in multiple large-scale datasets comprising 2.1M preprints, 28K peer review reports, and 246M online accesses to scientific documents. We find: 1) scientists who adopt LLMs to draft manuscripts show a large increase in paper production, ranging from 23.7% to 89.3% depending on scientific field and author background; 2) LLM use has reversed the relationship between writing complexity and paper quality, leading to an influx of manuscripts that are linguistically complex but substantively underwhelming; and 3) LLM adopters access and cite more diverse prior work, including books and younger, less-cited documents. These findings highlight a stunning shift in scientific production that will likely require a change in how journals, funding agencies, and tenure committees evaluate scientific work.




Abstract: With rapid progress across platforms for quantum systems, many-body quantum state reconstruction for noisy quantum states has become an important challenge. Recent works have found promise in recasting quantum state reconstruction as learning the probability distribution of quantum state measurement vectors using generative neural network models. Here we propose "Attention-based Quantum Tomography" (AQT), a quantum state reconstruction technique that uses an attention-based generative network to learn the mixed-state density matrix of a noisy quantum state. AQT builds on the model proposed in "Attention Is All You Need" by Vaswani et al. (2017), which is designed to learn long-range correlations in natural language sentences and thereby outperform previous natural language processing models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks, but also that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized on an IBMQ quantum computer. We speculate that the success of AQT stems from its ability to model quantum entanglement across the entire quantum system, much as the attention model for natural language processing captures correlations among words in a sentence.
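To make the general approach concrete, here is a minimal sketch, not the authors' implementation, of the core idea the abstract describes: an autoregressive Transformer that models the probability distribution over measurement outcome strings (e.g., outcomes of an informationally complete POVM) and is trained by maximum likelihood on measurement data. The class name, hyperparameters, and the random stand-in data are all illustrative assumptions; AQT's additional step of recovering the mixed-state density matrix from the learned distribution is not shown.

```python
# Minimal sketch (assumed names/hyperparameters, not the paper's code):
# model p(s_1, ..., s_n) over per-site measurement outcomes with a small
# causal Transformer, trained by maximum likelihood.
import torch
import torch.nn as nn

class OutcomeTransformer(nn.Module):
    """Autoregressive model over outcome strings s_i in {0, ..., K-1}."""
    def __init__(self, n_sites: int, n_outcomes: int = 4,
                 d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(n_outcomes + 1, d_model)  # +1: start token
        self.pos = nn.Parameter(torch.zeros(n_sites, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_outcomes)
        self.n_sites = n_sites
        self.start = n_outcomes  # index reserved for the start token

    def forward(self, outcomes: torch.Tensor) -> torch.Tensor:
        # outcomes: (batch, n_sites) integer outcome strings
        batch = outcomes.shape[0]
        start = torch.full((batch, 1), self.start, dtype=torch.long,
                           device=outcomes.device)
        x = torch.cat([start, outcomes[:, :-1]], dim=1)  # shift right
        h = self.embed(x) + self.pos
        # Causal mask: each site attends only to earlier sites.
        mask = torch.triu(torch.full((self.n_sites, self.n_sites),
                                     float("-inf")), diagonal=1)
        h = self.encoder(h, mask=mask)
        return self.head(h)  # (batch, n_sites, n_outcomes) logits

# Training loop on stand-in data; real input would be POVM outcome records
# from the quantum device being characterized.
model = OutcomeTransformer(n_sites=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randint(0, 4, (256, 4))  # placeholder measurement data
for _ in range(5):
    logits = model(data)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 4),
                                       data.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"negative log-likelihood per site: {loss.item():.3f}")
```

The self-attention layers here play the role the abstract attributes to them: every site can attend to every other site, so long-range correlations (entanglement, in the tomography setting) need not decay with distance as they would in a recurrent model.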




Abstract: We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iterative refinement technique for improving the performance of greedy strategies. We compare our results to known benchmarks, using known metrics. We demonstrate state-of-the-art performance for an untrained method with our Content Vector Segmentation (CVS) on the Choi test set. Finally, we apply the segmentation procedure to an in-the-wild dataset consisting of text extracted from scholarly articles in the arXiv.org database.
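As an illustration of the class of objectives involved, the following is a sketch under assumptions, not the paper's exact CVS algorithm: it greedily inserts segment boundaries at the positions that most improve the summed within-segment coherence of sentence vectors. The function names and the random toy data are invented for the example.

```python
# Illustrative greedy embedding-based segmentation (simplified stand-in,
# not the paper's exact CVS objective).
import numpy as np

def coherence(vectors: np.ndarray) -> float:
    """Mean cosine similarity of each sentence vector to its segment centroid."""
    centroid = vectors.mean(axis=0)
    centroid /= np.linalg.norm(centroid) + 1e-12
    normed = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12)
    return float((normed @ centroid).mean())

def greedy_segment(vectors: np.ndarray, n_segments: int) -> list[int]:
    """Greedily add n_segments - 1 boundaries, one at a time, each at the
    position maximizing total segment coherence. Returns boundary indices
    including the endpoints 0 and len(vectors)."""
    boundaries = [0, len(vectors)]
    for _ in range(n_segments - 1):
        best_score, best_pos = -np.inf, None
        for pos in range(1, len(vectors)):
            if pos in boundaries:
                continue
            cand = sorted(boundaries + [pos])
            score = sum(coherence(vectors[a:b])
                        for a, b in zip(cand, cand[1:]))
            if score > best_score:
                best_score, best_pos = score, pos
        boundaries = sorted(boundaries + [best_pos])
    return boundaries

# Toy usage: two clearly separated "topics". Real sentence vectors would come
# from averaged word embeddings (e.g., word2vec or GloVe) per sentence.
rng = np.random.default_rng(0)
sentences = np.vstack([rng.normal(+3.0, 1.0, (5, 50)),   # topic A
                       rng.normal(-3.0, 1.0, (5, 50))])  # topic B
print(greedy_segment(sentences, n_segments=2))  # expect [0, 5, 10]
```

The iterative refinement the abstract mentions would then revisit each chosen boundary and move it to a locally optimal position; the sketch above covers only the greedy insertion step.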




Abstract: We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners.